-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237
Enable GPU exection of atm_divergence_damping_3d via OpenACC #1237
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested this today for the merged branch (develop+atmosphere/port_divergence_damping_3d) versus the base branch (develop), the results matched!
However, I have some comments to address:
@@ -2599,6 +2599,13 @@ subroutine atm_divergence_damping_3d( state, diag, mesh, configs, dts, edgeStart | |||
rdts = 1.0_RKIND / dts | |||
coef_divdamp = 2.0_RKIND * smdiv * config_len_disp * rdts | |||
|
|||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | |||
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, specZoneMaskEdge, & | |||
!$acc theta_m, cellsOnEdge) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specZoneMaskEdge
and cellsOnEdge
are invariant fields (the won't change after init). I believe your PR should make sure these variables are handled in atm_dynamics_init
and shouldn't copyin or delete these fields in atm_divergence_damping_3d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the convention we've been following so far? I assumed the routine to be ported has localized data movement clauses. I might also need to rework my other PRs if this is the case. Thanks for pointing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to follow up.. Talked to @mgduda, and got this cleared up. Will make changes to these invariant fields. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that the convention we've been following so far?
I've been following that pattern, it was set in #1176.
I've been doing things similarly to what may happen here - do all the data movement that is needed locally for the function being ported, and then separate out those invariant fields and handle them in the atm_dynamics_{init,finalize}
routines. Typically these invariant fields are part of the mesh
var_struct as described in the src/core_atmosphere/Registry.xml.
!$acc theta_m, cellsOnEdge) | ||
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]') | ||
|
||
!$acc parallel async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async
could be dangerous here, I don't believe acc exit data copyout(ru_p)
will wait for this kernel to finish before beginning to copy.
I think it would be best to remove this async
since we don't need it (yet) and it could create issues for the kernels that come next and depend on ru_p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove the async
. thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this async, but wanted to note that there's another one in atm_advance_scalars_work
. Is that fine for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this async, but wanted to note that there's another one in atm_advance_scalars_work. Is that fine for now?
I think we're good to leave that async
clause for now, it has a matching wait
directive/clause later on.
I've addressed the review comments, and also checked the output restart file for bit-for-bit reproducibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | ||
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m) | ||
!$acc end data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this !$acc end data
clause needed? (That's an actual, not a rhetorical, question!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed because of the acc data present
, which is currently written as a structured data region. Structured regions in Fortran OpenACC (parallel
and data
) require an acc end xxx
statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gdicker1 Thanks for that explanation! The !$acc data present
directive didn't register when I was scanning through the code changes.
@abishekg7 Perhaps it would be better to follow the pattern in other routines that have been ported, and to avoid the use of structured data directives for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to add default(present)
clauses to the !$acc parallel
directives, that would obviate the need for the !$acc data present(specZoneMaskEdge, cellsOnEdge)
directive I would think.
!$acc enter data copyin(ru_p, rtheta_pp, rtheta_pp_old, theta_m) | ||
MPAS_ACC_TIMER_STOP('atm_divergence_damping_3d [ACC_data_xfer]') | ||
|
||
!$acc parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than the !$acc data present
directive above on line 2611, it might be simpler and more robust to just add a default(present)
clause to this parallel
directive.
@gdicker1 Does this align with your understanding of best practices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best practices would be neutral on this, whatever works best for our code and development. However, I agree that default(present)
could be simpler and require less of us.
We would have to ask Pranay or other MPAS v7 OpenACC developers why they have similar function-level acc data present ...
directives in their port. Speculation here: I think enclosing the kernels in a structured data region that ensures all data is present prevents the kernels from each checking their data - an optimization concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding the default(present)
for now, but noting the potential concern re. performance.
!$acc end parallel | ||
|
||
MPAS_ACC_TIMER_START('atm_divergence_damping_3d [ACC_data_xfer]') | ||
!$acc exit data copyout(ru_p) delete(rtheta_pp, rtheta_pp_old, theta_m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of clarity, I would suggest either having separate directives for copyout
and delete
, or at least splitting the copyout
and delete
clauses onto separate lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@abishekg7 The code changes look good to me! I'd only suggest reworking the commit history; then I think we'll be all set. |
This commit enables the GPU execution of the atm_divergence_damping_3d subroutine using OpenACC directives for the data movements and loops. A new timer, 'atm_divergence_damping_3d [ACC_data_xfer]', has been added for host- GPU data transfers in this subroutine. This port follows these considerations: - The data transfers for the invariant fields have been moved to mpas_atm_dynamics_init and finalize. - Explicitly adding gang, worker and vector level parallelism to parallel constructs in atm_compute_vert_imp_coefs_work. - The parallel constructs use default(present) clauses to avoid implicit data movements by the compiler. This change also requires dereferencing the pointers to nCellsSolve and nVertLevels, in order to avoid runtime error messages such as the following: FATAL ERROR: data in PRESENT clause was not found on device 1:name=ncellssolve
3b82420
to
7db74f8
Compare
Cleaned up the commit history. 1 commit seemed sufficient. Also, tested the rebased version in limited area mode and restart files are still bit identical. |
@abishekg7 one change, and I think this is good.
Could you change "have been moved"? It implies such data movement was in the |
This PR enables the GPU execution of atm_divergence_damping_3d subroutine.
Tested with a real and idealized test cases.